Practice Activity 5: Neural Networks
Predicting Wine Price
Please review the following site for information on our dataset of interest here: https://www.kaggle.com/datasets/dev7halo/wine-information
Your goal is to use the other variables in the dataset to predict wine price. Feel free to use only a subset of the variables.
Assignment Specs
- You should compare Neural Networks as we discussed this week to at least one of our previous models from this quarter.
- A secondary goal of this assignment is to test the effects of the neural network function(s) arguments on the algorithm’s performance.
- You should explore at least 5 different sets of settings for the function inputs, and you should do your best to find values for these inputs that actually change the results of your modelling. That is, try not to run three different sets of inputs that result in the same performance. The goal here is for you to better understand how to set these input values yourself in the future. Comment on what you discover about these inputs and how they behave.
- Additionally, I’d like you to include pictures of the network architecture for each of the neural network models you run. You may hand-draw them and insert pictures into your submitted files if you wish. You may also use software (e.g. draw.io) to create nice looking diagrams. I want you to become intimately familiar with these types of models and what they look like.
- Your submission should be built and written with non-experts as the target audience. All of your code should still be included, but do your best to narrate your work in accessible ways.
- Again, submit an HTML, ipynb, or Colab link. Be sure to rerun your entire notebook fresh before submitting!
The Data
In this activity, we will explore a dataset containing detailed information about wines, including attributes like country, points (rating), province, variety, and winery, among others. Our main goal is to use the available information to predict the price of a wine.
The dataset we are using is sourced from Kaggle and has already undergone some initial cleansing (hence the file name cleansingWine.csv). We will perform further exploration and modeling to understand the patterns and relationships between wine characteristics and their pricing.
To begin, we will load the dataset using pandas and perform a quick initial inspection.
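The load itself is a single `pd.read_csv` call; here is a minimal, self-contained sketch of that step (it reads an in-memory two-row sample rather than the actual cleansingWine.csv file, and uses a separate `demo_df` variable, so it runs anywhere without touching the real `wine_df`):

```python
import io
import pandas as pd

# In the notebook the load is simply: wine_df = pd.read_csv("cleansingWine.csv")
# Here we read a tiny in-memory sample instead so the sketch is self-contained.
sample_csv = io.StringIO(
    "id,name,nation,price,year,ml\n"
    "137197,Altair,Chile,220000,2014,750\n"
    "137199,Baron du Val Red,France,0,0,750\n"
)
demo_df = pd.read_csv(sample_csv)

print(demo_df.shape)  # (2, 6)
demo_df.info()        # dtypes and non-null counts, as in the output below
```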
Code
| | id | name | producer | nation | local1 | local2 | local3 | local4 | varieties1 | varieties2 | ... | use | abv | degree | sweet | acidity | body | tannin | price | year | ml |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 137197 | Altair | Altair | Chile | Rapel Valley | NaN | NaN | NaN | Cabernet Sauvignon | Carmenere | ... | Table | 14~15 | 17~19 | SWEET1 | ACIDITY4 | BODY5 | TANNIN4 | 220000 | 2014 | 750 |
| 1 | 137198 | Altair, Sideral | Altair | Chile | Rapel Valley | NaN | NaN | NaN | Cabernet Sauvignon | Merlot | ... | Table | 14~15 | 16~18 | SWEET1 | ACIDITY3 | BODY4 | TANNIN4 | 110000 | 2016 | 750 |
| 2 | 137199 | Baron du Val Red | Baron du Val | France | NaN | NaN | NaN | NaN | Carignan | Cinsault | ... | Table | 11~12 | 15~17 | SWEET2 | ACIDITY3 | BODY2 | TANNIN2 | 0 | 0 | 750 |
| 3 | 137200 | Baron du Val White | Baron du Val | France | NaN | NaN | NaN | NaN | Carignan | Ugni blanc | ... | Table | 11~12 | 9~11 | SWEET1 | ACIDITY3 | BODY2 | TANNIN1 | 0 | 0 | 750 |
| 4 | 137201 | Benziger, Cabernet Sauvignon | Benziger | USA | California | NaN | NaN | NaN | Cabernet Sauvignon | NaN | ... | Table | 13~14 | 17~19 | SWEET1 | ACIDITY3 | BODY3 | TANNIN4 | 0 | 2003 | 750 |
5 rows × 31 columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21605 entries, 0 to 21604
Data columns (total 31 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 21605 non-null int64
1 name 21605 non-null object
2 producer 21605 non-null object
3 nation 21603 non-null object
4 local1 20705 non-null object
5 local2 11145 non-null object
6 local3 3591 non-null object
7 local4 2 non-null object
8 varieties1 21256 non-null object
9 varieties2 7518 non-null object
10 varieties3 4028 non-null object
11 varieties4 1330 non-null object
12 varieties5 379 non-null object
13 varieties6 105 non-null object
14 varieties7 31 non-null object
15 varieties8 18 non-null object
16 varieties9 7 non-null object
17 varieties10 6 non-null object
18 varieties11 5 non-null object
19 varieties12 4 non-null object
20 type 21547 non-null object
21 use 21591 non-null object
22 abv 14459 non-null object
23 degree 14460 non-null object
24 sweet 21603 non-null object
25 acidity 21592 non-null object
26 body 21592 non-null object
27 tannin 21592 non-null object
28 price 21605 non-null int64
29 year 21605 non-null int64
30 ml 21605 non-null int64
dtypes: int64(4), object(27)
memory usage: 5.1+ MB
Modeling
Our goal is to predict the price of a wine based on a subset of features from the dataset.
To do this, we will:
- Build a baseline model using a Bagging Regressor with a Decision Tree estimator.
- Build several Neural Networks with different settings to test how changes in the architecture and hyperparameters affect performance.
Our target variable is price.
Feature Selection
For simplicity and clarity, we focus on the following features:
- producer
- type
- use
- abv (Alcohol by Volume)
- sweet (Sweetness level)
- acidity (Acidity level)
- body (Body level)
- tannin (Tannin level)
- year (Vintage year)
- local1 (Local region)
- varieties1 (Grape variety)
These features were chosen because they are intuitively related to wine pricing and were relatively clean after preprocessing. Adding local1 and varieties1 helped capture more variation in wine characteristics, leading to improved model performance.
Preparing Feature Sets
Code
from sklearn.model_selection import train_test_split
# Select only the columns of interest
features = ['producer', 'local1', 'varieties1', 'type', 'use', 'abv', 'sweet', 'acidity', 'body', 'tannin', 'year']
target = 'price'
# Make a copy of the working data
model_data = wine_df[features + [target]].copy()
# Drop any rows with missing values
model_data = model_data.dropna()
# Keep only rows where price is greater than 0
model_data = model_data[model_data['price'] > 0]
# Convert features to appropriate numeric types
def clean_range(value):
    """Helper function to clean values like '14~15' into an average."""
    if isinstance(value, str) and '~' in value:
        low, high = value.split('~')
        return (float(low) + float(high)) / 2
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

for col in ['abv', 'year']:
    model_data[col] = model_data[col].apply(clean_range)

# Convert categorical columns like 'sweet', 'acidity', 'body', 'tannin'
# These are text codes like 'SWEET1', so we extract the number
def extract_number(value):
    """Helper to pull numbers out of text labels."""
    if isinstance(value, str):
        return int(''.join(filter(str.isdigit, value)))
    return None

for col in ['sweet', 'acidity', 'body', 'tannin']:
    model_data[col] = model_data[col].apply(extract_number)

# Drop again any rows with missing values after cleaning
model_data = model_data.dropna()

# Separate X and y
X = model_data[features]
y = model_data[target]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

Bagging
To build a strong baseline for comparison against our Neural Networks, we first train a Bagging Regressor. Bagging reduces variance by averaging predictions from multiple decision trees trained on different subsets of the data. We tune key hyperparameters like the number of estimators and tree depth using GridSearchCV. The final model’s MSE and R² scores will provide a benchmark for evaluating the performance of our Neural Networks.
Code
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Categorical features to OneHotEncode
categorical_features = ['producer', 'local1', 'varieties1', 'type', 'use']
categorical_transformer = OneHotEncoder(drop='first', handle_unknown='ignore')
# Preprocessor
tree_preprocessor = ColumnTransformer(
    transformers=[('cat', categorical_transformer, categorical_features)],
    remainder='passthrough'
)

# Full pipeline: preprocessing + bagging
pipe_tree = Pipeline(steps=[
    ('preprocessor', tree_preprocessor),
    ('regressor', BaggingRegressor(
        estimator=DecisionTreeRegressor(random_state=123),
        random_state=123
    )),
])

# Parameter grid for grid search
param_grid_tree = {
    'regressor__n_estimators': [50, 100, 200],
    'regressor__estimator__max_depth': [3, 5, 7],
    'regressor__estimator__min_samples_split': [2, 5]
}

# Grid search
grid_search = GridSearchCV(pipe_tree, param_grid_tree, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Evaluate the best model
best_tree = grid_search.best_estimator_
y_pred = best_tree.predict(X_test)
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Bagging MSE: {mean_squared_error(y_test, y_pred):.2f}")
print(f"Bagging R²: {r2_score(y_test, y_pred):.3f}")

Best Parameters: {'regressor__estimator__max_depth': 7, 'regressor__estimator__min_samples_split': 2, 'regressor__n_estimators': 100}
Bagging MSE: 37368762858.38
Bagging R²: 0.392
The Bagging Regressor achieved a Mean Squared Error (MSE) of approximately 37.37 billion and an R² score of 0.392 on the test set. This indicates that the model explains about 39% of the variance in wine prices — a respectable performance given the complexity and variability in wine pricing. By tuning hyperparameters like the number of estimators and tree depth, we were able to strengthen the model’s ability to generalize beyond the training data. While not perfect, Bagging provided a strong baseline for comparing the more complex Neural Network models.
5 Neural Networks for Wine Price Prediction
To explore how Neural Networks perform on our wine price prediction task, we designed five different models, each with a unique architecture or training setting. The goal was to better understand how specific design choices (the number of neurons, the depth of the network, the activation function, and regularization techniques) affect the model's accuracy and generalization. Each model builds on the one before it, allowing us to observe the effects of increasing complexity or adding stabilization techniques. For each model, we report the test performance and provide a visual diagram of its architecture.
Model 1: Small Simple Network
Model 2: More Neurons
Model 3: Deeper Network
Model 4: Tanh Activation
Model 5: Dropout Regularization
Code
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# 2. Preprocessing: scale numeric features, one-hot encode categoricals
numeric_features = ['abv', 'sweet', 'acidity', 'body', 'tannin', 'year']
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(drop='first', handle_unknown='ignore'), categorical_features)
    ]
)

# Fit the preprocessor
X_train_prep = preprocessor.fit_transform(X_train)
X_test_prep = preprocessor.transform(X_test)

# 3. Get input shape for model
input_shape = X_train_prep.shape[1]

# Function to compile, train, and evaluate a model
def build_and_train(model, model_name, optimizer='adam', epochs=50, batch_size=32):
    model.compile(optimizer=optimizer, loss='mse', metrics=['mae'])
    history = model.fit(
        X_train_prep, y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=0.2,
        verbose=0
    )
    y_pred = model.predict(X_test_prep).flatten()
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print(f"{model_name} | Epochs: {epochs} | Batch Size: {batch_size}")
    print(f"Test MSE: {mse:.2f}")
    print(f"Test MAE: {mae:.2f}")
    print(f"Test R²: {r2:.3f}")
    print("-" * 40)
    return mse, mae, r2

import pandas as pd
from itertools import product

def grid_search_nn(model_architecture_fn, model_name, epochs_list, batch_sizes_list, learning_rates_list=None):
    results = []
    # Default learning rate if none provided
    if learning_rates_list is None:
        learning_rates_list = [0.001]
    for epochs, batch_size, lr in product(epochs_list, batch_sizes_list, learning_rates_list):
        # Rebuild model fresh each time
        model = model_architecture_fn()
        # Custom optimizer with given learning rate
        optimizer = keras.optimizers.Adam(learning_rate=lr)
        mse, mae, r2 = build_and_train(
            model,
            model_name=f"{model_name} (epochs={epochs}, batch={batch_size}, lr={lr})",
            optimizer=optimizer,
            epochs=epochs,
            batch_size=batch_size
        )
        results.append({
            "Model": model_name,
            "Epochs": epochs,
            "Batch Size": batch_size,
            "Learning Rate": lr,
            "MSE": mse,
            "MAE": mae,
            "R2": r2
        })
    return pd.DataFrame(results)

Code
# Model 1: Small Simple Network
def build_model_1():
    return keras.Sequential([
        layers.Input(shape=(input_shape,)),
        layers.Dense(16, activation='relu'),
        layers.Dense(1)
    ])

# Model 2: More Neurons
def build_model_2():
    return keras.Sequential([
        layers.Input(shape=(input_shape,)),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])

# Model 3: Deeper Network
def build_model_3():
    return keras.Sequential([
        layers.Input(shape=(input_shape,)),
        layers.Dense(32, activation='relu'),
        layers.Dense(16, activation='relu'),
        layers.Dense(1)
    ])

# Model 4: Different Activation Function (tanh)
def build_model_4():
    return keras.Sequential([
        layers.Input(shape=(input_shape,)),
        layers.Dense(32, activation='tanh'),
        layers.Dense(1)
    ])

# Model 5: Dropout Regularization
def build_model_5():
    return keras.Sequential([
        layers.Input(shape=(input_shape,)),
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(16, activation='relu'),
        layers.Dense(1)
    ])

# Model 6: Improved Network with Batch Normalization
def build_model_6():
    return keras.Sequential([
        layers.Input(shape=(input_shape,)),
        layers.Dense(64, activation='relu'),
        layers.BatchNormalization(),
        layers.Dense(32, activation='relu'),
        layers.Dense(1)
    ])

Model 1: Small Simple Network (Baseline)
Architecture: Input → Dense(16, relu) → Output(1)
Code
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 1: Small Simple Network (epochs=50, batch=16, lr=0.001) | Epochs: 50 | Batch Size: 16
Test MSE: 78538915840.00
Test MAE: 132266.70
Test R²: -0.278
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 1: Small Simple Network (epochs=50, batch=32, lr=0.001) | Epochs: 50 | Batch Size: 32
Test MSE: 79675752448.00
Test MAE: 135753.27
Test R²: -0.296
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 1: Small Simple Network (epochs=100, batch=16, lr=0.001) | Epochs: 100 | Batch Size: 16
Test MSE: 70710386688.00
Test MAE: 107798.47
Test R²: -0.151
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 1: Small Simple Network (epochs=100, batch=32, lr=0.001) | Epochs: 100 | Batch Size: 32
Test MSE: 77643071488.00
Test MAE: 129584.01
Test R²: -0.263
----------------------------------------
The first Neural Network used a simple architecture with just one hidden layer of 16 neurons. Across different training settings, the best result was achieved with 100 epochs and a batch size of 16, but the model still resulted in a negative R² score. This indicates that the model performed worse than simply predicting the average price for every wine. The small size of the network limited its ability to capture complex patterns in the data.
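The "worse than predicting the average" reading of a negative R² can be checked directly on a toy example (synthetic numbers, not our wine data):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([10.0, 20.0, 30.0, 40.0])

# Always predicting the mean gives an R² of exactly 0...
mean_baseline = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_baseline))  # 0.0

# ...so any model with R² < 0 is losing to that trivial baseline
bad_preds = np.array([40.0, 10.0, 40.0, 10.0])
print(r2_score(y_true, bad_preds))      # -3.0
```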
Model 2: More Neurons (Capacity Test)
Architecture: Input → Dense(64, relu) → Output(1)
Code
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 2: More Neurons (epochs=50, batch=16, lr=0.001) | Epochs: 50 | Batch Size: 16
Test MSE: 69099921408.00
Test MAE: 103644.16
Test R²: -0.124
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 2: More Neurons (epochs=50, batch=32, lr=0.001) | Epochs: 50 | Batch Size: 32
Test MSE: 76046262272.00
Test MAE: 124582.50
Test R²: -0.237
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 2: More Neurons (epochs=100, batch=16, lr=0.001) | Epochs: 100 | Batch Size: 16
Test MSE: 60545134592.00
Test MAE: 92502.96
Test R²: 0.015
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 2: More Neurons (epochs=100, batch=32, lr=0.001) | Epochs: 100 | Batch Size: 32
Test MSE: 67547275264.00
Test MAE: 100267.10
Test R²: -0.099
----------------------------------------
In this model, we increased the number of neurons to 64 in a single hidden layer. Although adding more neurons slightly reduced the Mean Absolute Error (MAE), the R² score remained mostly negative. This shows that simply making the network wider, without adding depth or other improvements, was not enough to meaningfully capture the complexity of wine pricing.
Model 3: Deeper Network (More Layers)
Architecture: Input → Dense(32, relu) → Dense(16, relu) → Output(1)
Code
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 3: Deeper Network (epochs=50, batch=16, lr=0.001) | Epochs: 50 | Batch Size: 16
Test MSE: 49275625472.00
Test MAE: 91354.25
Test R²: 0.198
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 3: Deeper Network (epochs=50, batch=32, lr=0.001) | Epochs: 50 | Batch Size: 32
Test MSE: 51513749504.00
Test MAE: 95582.19
Test R²: 0.162
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 3: Deeper Network (epochs=100, batch=16, lr=0.001) | Epochs: 100 | Batch Size: 16
Test MSE: 45317394432.00
Test MAE: 86743.20
Test R²: 0.263
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 3: Deeper Network (epochs=100, batch=32, lr=0.001) | Epochs: 100 | Batch Size: 32
Test MSE: 49432190976.00
Test MAE: 92049.80
Test R²: 0.196
----------------------------------------
This model added a second hidden layer (32 neurons → 16 neurons). Adding depth led to a significant improvement: we achieved a positive R² score for the first time, meaning the model was better than simply guessing the average. The best performance came from training with 100 epochs and a batch size of 16, showing that giving the model more time to learn and smaller batch updates helped it generalize better.
Model 4: Different Activation (tanh)
Architecture: Input → Dense(32, tanh) → Output(1)
Code
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 4: Tanh Activation (epochs=50, batch=16, lr=0.001) | Epochs: 50 | Batch Size: 16
Test MSE: 80635559936.00
Test MAE: 138475.80
Test R²: -0.312
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 4: Tanh Activation (epochs=50, batch=32, lr=0.001) | Epochs: 50 | Batch Size: 32
Test MSE: 80699146240.00
Test MAE: 138705.20
Test R²: -0.313
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 4: Tanh Activation (epochs=100, batch=16, lr=0.001) | Epochs: 100 | Batch Size: 16
Test MSE: 80501809152.00
Test MAE: 137992.02
Test R²: -0.310
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 4: Tanh Activation (epochs=100, batch=32, lr=0.001) | Epochs: 100 | Batch Size: 32
Test MSE: 80629530624.00
Test MAE: 138454.02
Test R²: -0.312
----------------------------------------
We replaced the ReLU activation with tanh in the hidden layer. Across all training settings, this model consistently performed poorly, with high error rates and a negative R². This suggests that ReLU activation was better suited for this dataset, likely because ReLU handles wide-ranging numeric inputs without saturation issues that tanh sometimes suffers from.
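The saturation issue is easy to see numerically: tanh squashes every input into (-1, 1), so large pre-activations all map to nearly the same value, while ReLU passes positive inputs through unchanged:

```python
import numpy as np

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])

print(np.maximum(0.0, x))  # ReLU: [  0.  0.  0.  1. 100.] — unbounded above
print(np.tanh(x))          # tanh: pinned near ±1 for large |x|
```

Since our price target spans values in the hundreds of thousands, the gradients flowing back through a saturated tanh layer are likely close to zero, which would explain the flat, uniformly poor results above.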
Model 5: Add Dropout (Regularization)
Architecture: Input → Dense(32, relu) → Dropout(0.3) → Dense(16, relu) → Output(1)
Code
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 5: Dropout Regularization (epochs=50, batch=16, lr=0.001) | Epochs: 50 | Batch Size: 16
Test MSE: 49716576256.00
Test MAE: 91169.36
Test R²: 0.191
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 5: Dropout Regularization (epochs=50, batch=32, lr=0.001) | Epochs: 50 | Batch Size: 32
Test MSE: 52386275328.00
Test MAE: 96110.25
Test R²: 0.148
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 5: Dropout Regularization (epochs=100, batch=16, lr=0.001) | Epochs: 100 | Batch Size: 16
Test MSE: 45954666496.00
Test MAE: 86285.38
Test R²: 0.252
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 5: Dropout Regularization (epochs=100, batch=32, lr=0.001) | Epochs: 100 | Batch Size: 32
Test MSE: 47438626816.00
Test MAE: 88526.82
Test R²: 0.228
----------------------------------------
This network introduced a Dropout layer (30%) to help prevent overfitting. The model achieved stable but slightly lower performance compared to the deeper network without dropout. Dropout helped regularize the network and made it less prone to memorizing the training data, but it slightly limited the model’s ability to fully fit the data. Still, with 100 epochs and a batch size of 16, it achieved a reasonably strong positive R².
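Mechanically, "inverted" dropout zeroes a random 30% of activations during training and scales the survivors by 1/(1 - 0.3) so the expected total is unchanged, then does nothing at inference time. A numpy sketch of the idea (Keras's internal implementation differs in details):

```python
import numpy as np

rng = np.random.default_rng(123)
rate = 0.3
activations = np.ones(10)

# Training: drop ~30% of units; scale survivors so the expected sum is unchanged
mask = rng.random(activations.shape) >= rate
train_out = activations * mask / (1 - rate)
print(train_out)  # dropped units are 0; survivors are 1/0.7 ≈ 1.43

# Inference: dropout is a no-op
infer_out = activations
print(infer_out)
```

Because a different random mask is drawn every batch, no single neuron can be relied upon, which is what discourages the memorization we describe above.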
Improving Upon Our Best NN
With help from GeeksForGeeks: https://www.geeksforgeeks.org/what-is-batch-normalization-in-deep-learning/
To improve upon our best Neural Network so far, we built a new, slightly larger model with two hidden layers. The first hidden layer has 64 neurons, followed by Batch Normalization to help stabilize and speed up training. The second hidden layer has 32 neurons. We also increased the number of training epochs to 100 to give the model more time to learn the complex relationships in the data, and we used a smaller learning rate (0.001) for finer adjustments during training.
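What Batch Normalization actually computes is simple: each feature is standardized over the current batch, then rescaled and shifted by learnable parameters (gamma and beta, initialized to 1 and 0). A numpy sketch of the training-time forward pass:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each column over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Two features on wildly different scales...
batch = np.array([[1.0, 200.0],
                  [3.0, 400.0],
                  [5.0, 600.0]])

# ...come out with mean ~0 and std ~1 per feature
out = batch_norm(batch)
print(out.mean(axis=0))  # ~[0, 0]
print(out.std(axis=0))   # ~[1, 1]
```

Keeping each layer's inputs on a consistent scale like this is what stabilizes and speeds up training in Model 6 below.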
Model 6: Improved Network with Batch Norm
Code
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 6: Improved Network with Batch Norm (epochs=50, batch=16, lr=0.001) | Epochs: 50 | Batch Size: 16
Test MSE: 44054331392.00
Test MAE: 78002.59
Test R²: 0.283
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 6: Improved Network with Batch Norm (epochs=50, batch=32, lr=0.001) | Epochs: 50 | Batch Size: 32
Test MSE: 38779949056.00
Test MAE: 83923.19
Test R²: 0.369
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 6: Improved Network with Batch Norm (epochs=100, batch=16, lr=0.001) | Epochs: 100 | Batch Size: 16
Test MSE: 47651987456.00
Test MAE: 84262.73
Test R²: 0.225
----------------------------------------
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Model 6: Improved Network with Batch Norm (epochs=100, batch=32, lr=0.001) | Epochs: 100 | Batch Size: 32
Test MSE: 38878797824.00
Test MAE: 84407.50
Test R²: 0.367
----------------------------------------
For our best model, we expanded the network to two hidden layers (64 neurons → 32 neurons) and added Batch Normalization after the first layer. This helped stabilize training and reduce internal covariate shift. Model 6 achieved the best results overall, with the highest R² score of 0.369, when trained for 50 epochs with a batch size of 32. This shows that thoughtful architectural changes, combined with proper training settings, can make Neural Networks competitive even with ensemble methods like Bagging.
Conclusion
In this activity, we explored different ways to predict wine prices using machine learning models, starting with a Bagging Regressor as a strong baseline and then developing six different Neural Network architectures. Each neural network was built with the goal of understanding how changing specific design choices—like the number of neurons, depth of layers, activation functions, dropout regularization, and batch normalization—impacts model performance. Our final and best-performing Neural Network (Model 6), which included Batch Normalization and two hidden layers, achieved an R² score of 0.369, coming close to our Bagging model’s R² of 0.392. This demonstrates that, with thoughtful design and tuning, Neural Networks can be strong contenders even when compared to traditional ensemble methods.